INTRODUCTION


A western lifestyle, characterized by low rates of energy expenditure and a high-energy diet rich in saturated fats and refined carbohydrates, is associated with a high incidence of breast cancer in women. This type of lifestyle induces storage of excess energy in the form of triglycerides, produced either from dietary fatty acids or from fatty acids synthesized de novo. Excess energy intake and obesity also cause insulin resistance, which is associated with elevated blood levels of glucose and insulin. These factors induce fatty acid synthesis in different tissues and have been implicated in the etiology of various cancer types, including breast cancer (refs 1-12).

Several studies have demonstrated high levels of key fatty acid synthesis enzymes, fatty acid synthase (FAS) and acetyl-CoA carboxylase alpha (ACCalpha), in human breast cancer as well as in other tumor types (refs 13-16). FAS inhibitors have been shown to delay tumor progression in breast cancer xenograft models and to induce apoptosis of breast carcinoma cells (refs 17-23). We recently discovered a highly specific interaction between ACCalpha and the protein encoded by the breast cancer susceptibility gene BRCA1 (ref 24), which further supports a possible central role of lipogenic enzymes in breast cancer development.

These observations led us to hypothesize that genes involved in cellular fatty acid synthesis may be centrally implicated in mammary gland carcinogenesis, and that polymorphic alleles that increase the expression or activity of these genes confer increased breast cancer susceptibility. The specific aims of the proposed study are:

■ to search exhaustively for sequence variations in seven selected genes coding for key lipogenic enzymes (ACCalpha, FAS) and their principal regulatory factors (AMPKalpha1, AMPKalpha2, ChREBP, SREBP1, NFYA);

■ to examine associations of these sequence variations with breast cancer risk, using a large case-control study nested within the European Prospective Investigation into Cancer and Nutrition (EPIC), a prospective cohort in ten Western European countries; and

■ to examine interactions between the genetic variants and lifestyle factors, such as excess weight and estimated intakes of different types of fat, in determining breast cancer risk.
BODY


The following tasks of the approved Statement of Work applicable to the first year of the award have been accomplished:

Task 1: Selection of cases and controls, using the established eligibility and matching criteria, and extraction of a database with relevant information from questionnaires and anthropometry (Months 1-4).

A total of 2510 incident breast cancer cases with blood samples taken before cancer diagnosis were identified, and 3636 control subjects were matched to them. Matching factors were age at blood donation, EPIC study center of recruitment into the cohort, menopausal status at blood donation, and phase of the menstrual cycle (for premenopausal women). By the end of 2005, after the next round of follow-up to identify further incident breast cancer cases, we plan to extend the series to about 3000 incident breast cancer cases and about 4000 matched control subjects.
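To make the four matching factors concrete, here is a minimal Python sketch of matching controls to cases. It is an illustration only, not the EPIC selection procedure: the `Subject` record, the ±0.5-year age tolerance, the cap of two controls per case and the center names used in testing are all hypothetical, and drawing controls without replacement is a simplification of formal risk-set sampling.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Subject:
    subject_id: str
    center: str                 # EPIC recruitment center
    age: float                  # age at blood donation, in years
    menopausal: str             # menopausal status at blood donation, e.g. "pre"/"post"
    cycle_phase: Optional[str]  # menstrual cycle phase; only used when "pre"


def is_eligible(case: Subject, cand: Subject, age_tol: float) -> bool:
    """Apply the four matching factors listed in Task 1 (tolerance is hypothetical)."""
    if cand.center != case.center or cand.menopausal != case.menopausal:
        return False
    if abs(cand.age - case.age) > age_tol:
        return False
    # Cycle phase is matched only for premenopausal women.
    if case.menopausal == "pre" and cand.cycle_phase != case.cycle_phase:
        return False
    return True


def match_controls(cases: List[Subject], pool: List[Subject],
                   max_controls: int = 2, age_tol: float = 0.5,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Draw up to max_controls eligible controls per case, without replacement."""
    rng = random.Random(seed)
    available = list(pool)
    matches: Dict[str, List[str]] = {}
    for case in cases:
        candidates = [c for c in available if is_eligible(case, c, age_tol)]
        chosen = rng.sample(candidates, min(max_controls, len(candidates)))
        for c in chosen:
            available.remove(c)  # each pool member serves as a control at most once
        matches[case.subject_id] = [c.subject_id for c in chosen]
    return matches
```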


Task 2: Retrieval of buffy coat samples from the central EPIC storage facility, and completion of DNA extraction (for ~1000 cases and ~1000 controls for whom DNA has not yet been extracted); preparation of microwell plates with DNA samples, ready for PCR (Months 2-12).

For a total of 1719 breast cancer cases and 2844 control subjects, DNA had already been extracted from buffy coat samples as part of a previous project. For the additional 791 cases currently identified and their 791 matched controls, all with blood samples stored at the central biorepository at IARC, DNA extraction is ongoing.
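Task 2 ends with DNA samples arrayed on microwell plates. One common design choice, assumed here rather than taken from this report, is to keep each case and its matched controls on the same plate, so that plate-to-plate variation in PCR cannot masquerade as a case-control difference. The sketch below packs matched sets onto hypothetical 96-well plates, filling wells column by column and reserving the last four wells of each plate for quality-control samples; all parameters are illustrative.

```python
from typing import List, Sequence, Tuple

ROWS = "ABCDEFGH"  # standard 8 x 12 layout of a 96-well plate


def well_name(index: int) -> str:
    """Column-major well naming: 0 -> A1, 7 -> H1, 8 -> A2, ..., 95 -> H12."""
    return f"{ROWS[index % 8]}{index // 8 + 1}"


def layout_plates(matched_sets: Sequence[Sequence[str]],
                  wells_per_plate: int = 96,
                  qc_wells: int = 4) -> List[List[Tuple[str, str]]]:
    """Pack matched sets onto plates without splitting any set across plates.

    The last qc_wells positions of each plate are left free for
    quality-control samples such as blanks or duplicates.
    """
    capacity = wells_per_plate - qc_wells
    plates: List[List[str]] = []
    current: List[str] = []
    for matched_set in matched_sets:
        if len(matched_set) > capacity:
            raise ValueError("matched set larger than plate capacity")
        if len(current) + len(matched_set) > capacity:
            plates.append(current)  # start a new plate rather than split the set
            current = []
        current.extend(matched_set)
    if current:
        plates.append(current)
    return [[(well_name(i), sample) for i, sample in enumerate(plate)]
            for plate in plates]
```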


Task 3: Exhaustive SNP discovery in all candidate genes by resequencing of DNA (exons and potential regulatory elements) from 46 breast cancer patients (Months 1-24).

During 2004, we completed the laboratory development steps required to enable high-throughput resequencing at IARC. One important milestone was the programming of a Laboratory Information Management System (LIMS) that can track the flow of samples and data in large-scale, moderately automated projects. The other milestone was the development of an automated laboratory process for resequencing using dye-primer sequencing chemistry. Dye-primer chemistry gives more even peak heights than dye-terminator chemistry, which makes it more appropriate for heterozygote detection; it is also more cost-effective. We improved upon the publicly available primer selection software Primer3 <http://www-genome.wi.mit.edu/genome_software/other/primer3.html> by incorporating automatic addition of M13 tails, more robust anti-hairpin protection, and more robust anti-primer-dimer protection.
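The primer post-processing described above can be illustrated with a deliberately simple Python sketch. It is not the modified Primer3 used in this project and does not reproduce Primer3's alignment-based scoring: the tail sequences are the widely used M13(-21) forward and M13 reverse sites, assumed here because the report does not list them, and the hairpin and dimer checks are a naive exact-match screen on the 3'-terminal 5 bases, with hypothetical thresholds.

```python
from typing import Tuple

# Assumed tails (commonly used M13 sites); the project's actual tails are not stated.
M13_FORWARD_TAIL = "TGTAAAACGACGGCCAGT"  # M13(-21) forward
M13_REVERSE_TAIL = "CAGGAAACAGCTATGAC"   # M13 reverse

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_m13_tails(forward: str, reverse: str) -> Tuple[str, str]:
    """Prepend the M13 tails to the 5' ends of a gene-specific primer pair."""
    return M13_FORWARD_TAIL + forward, M13_REVERSE_TAIL + reverse


def three_prime_dimer(p1: str, p2: str, window: int = 5) -> bool:
    """True if the 3'-terminal `window` bases of p1 could anneal somewhere on p2."""
    return revcomp(p1[-window:]) in p2


def three_prime_hairpin(primer: str, window: int = 5, min_loop: int = 3) -> bool:
    """True if the 3' end could fold back onto an upstream part of the same primer."""
    upstream = primer[: len(primer) - window - min_loop]
    return revcomp(primer[-window:]) in upstream


def primer_pair_ok(forward: str, reverse: str, window: int = 5) -> bool:
    """Reject pairs with 3' hairpins, self-dimers or cross-dimers (tails included)."""
    f, r = add_m13_tails(forward, reverse)
    problems = [three_prime_hairpin(f, window), three_prime_hairpin(r, window),
                three_prime_dimer(f, f, window), three_prime_dimer(r, r, window),
                three_prime_dimer(f, r, window), three_prime_dimer(r, f, window)]
    return not any(problems)
```

A pair such as a poly-A forward primer with a poly-C reverse primer passes this screen, while a forward primer ending in the palindrome GAATTC is rejected as a self-dimer.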

The gene management and resequencing workflow can be viewed as a five-step process:

1. Bioinformatic analysis of gene structure:

The genomic structure of the 5' UTR, 3' UTR, and coding regions of the six selected candidate genes (FAS, AMPKalpha1, AMPKalpha2, ChREBP, SREBP1, NFYA) was assessed using data from three public databases (Table 1):

■ the HUGO Gene Nomenclature website <http://www.gene.ucl.ac.uk/nomenclature/>,

■ the UC Santa Cruz genome browser <http://genome.ucsc.edu/>, and

■ PubMed/GenBank <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed>.

Genomic and cDNA sequences were downloaded, and the positions of all splice junctions were confirmed (Appendix 1). We had already screened a seventh gene relevant to this study, ACCalpha, for sequence variants in a series of 49 familial breast cancer cases in our previous study (ref 25).
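Confirming splice junctions amounts to checking that the exon coordinates reproduce the cDNA and that each intron has plausible boundaries. The following is a minimal sketch, assuming 0-based half-open exon coordinates on a plus-strand genomic sequence (a minus-strand gene would be reverse-complemented first). The canonical GT...AG rule is used as the boundary test, so a `False` marks an intron for manual review rather than an error, since rarer GC-AG and AT-AC introns also exist.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end) on the genomic sequence


def spliced_transcript(genomic: str, exons: List[Exon]) -> str:
    """Concatenate exon sequences in genomic order."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def check_junctions(genomic: str, exons: List[Exon], cdna: str) -> Dict[str, object]:
    """Check exon coordinates against a cDNA and against the GT...AG rule."""
    exons = sorted(exons)
    # Each intron runs from the end of one exon to the start of the next.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return {
        "matches_cdna": spliced_transcript(genomic, exons) == cdna,
        "canonical": [genomic[s:s + 2] == "GT" and genomic[e - 2:e] == "AG"
                      for s, e in introns],
    }
```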

2. Primer design:

Once we are confident of the gene structure, M13-tailed primers are designed. We have completed the design of primer pairs sufficient to PCR-amplify all of the exons of each of the 6 genes that we have proposed to resequence. Primers for 3 of the genes, AMPKalpha1 (PRKAA1), AMPKalpha2 (PRKAA2), and ChREBP (WBSCR14), were ordered and tested in late 2004. Primers for the fourth gene, FAS (which has by far the most exons of the genes that we need to resequence), were ordered in January 2005.
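Covering every exon with PCR products can be framed as choosing target windows before primers are placed. The sketch below pads each exon with intronic flank (so that splice sites are also sequenced) and splits long exons into overlapping windows. The 50 bp flank, 600 bp maximum window and 100 bp overlap are hypothetical values, not the project's design parameters, and primer selection within the flanking regions is left to a tool such as Primer3.

```python
from typing import List, Tuple


def amplicon_targets(exons: List[Tuple[int, int]], flank: int = 50,
                     max_len: int = 600, overlap: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) target windows covering each exon plus intronic flanks.

    Padded exons longer than max_len are split into windows that overlap by
    `overlap` bases, so every base lies in at least one window.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    targets = []
    for start, end in exons:
        lo, hi = max(0, start - flank), end + flank
        pos = lo
        while True:
            window_end = min(pos + max_len, hi)
            targets.append((pos, window_end))
            if window_end >= hi:
                break
            pos += step
    return targets
```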

3. Selection of a DNA sample set for systematic resequencing:

Resequencing is being carried out on DNA samples from lymphoblastoid cell lines established from 46 high-risk breast cancer cases, plus 1 chimpanzee lymphoblastoid line and 1 negative control. The breast cancer cases were all chosen from different families and had previously been screened for, and found not to carry, clearly deleterious mutations in BRCA1 or BRCA2. All of the cell lines were cultured and DNA was prepared by September 2004.
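A back-of-the-envelope calculation, which is not given in the report, shows what a panel of 46 individuals can detect. Assuming random sampling of 92 chromosomes and perfect heterozygote calling, the chance of observing an allele of frequency p at least once is 1 - (1 - p)^92:

```python
def detection_probability(maf: float, n_individuals: int = 46) -> float:
    """P(variant allele seen at least once among 2 * n_individuals chromosomes)."""
    n_chromosomes = 2 * n_individuals  # diploid samples
    return 1.0 - (1.0 - maf) ** n_chromosomes


for maf in (0.01, 0.02, 0.05):
    print(f"MAF {maf:.2f}: {detection_probability(maf):.3f}")
```

Under these assumptions an allele with 5% frequency would almost certainly be seen (about 99%), while one with 1% frequency would be detected only about 60% of the time (about 84% at 2%). Because the cases are selected high-risk patients, allele frequencies in this panel may differ from those in the general population.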

The 6 genes that we are resequencing comprise a total of 110 exons. Resequencing of the exons from exon 2 through the 3' UTR does not require any analysis beyond steps 1 and 2 above. Resequencing of exon 1 and the proximal promoter of each gene, on the other hand, benefits from a comparative genomics analysis. We therefore describe these two processes separately.

4. Standard exon resequencing:

Our capillary sequencer, a 96-capillary Spectrumedix 9610, was delivered in September 2004, and we completed sequencing process development in November 2004. In December we began resequencing 3 of the genes under study: AMPKalpha1 (PRKAA1), AMPKalpha2 (PRKAA2), and ChREBP (WBSCR14). As of mid-February 2005, we had completed the standard exon resequencing for AMPKalpha1 and AMPKalpha2; 5 standard exons of ChREBP remain to be done. Across these three genes, we have completed the analysis of 29 exons. Thus in ~2.5 months we executed just over 25% of the total resequencing (including exon 1 and the proximal promoters). Our resequencing results are summarized in Appendices 2.1 to 2.3.

5. Sequencing of the proximal promoter and other regulatory elements:

Part of our goal is to determine whether there are sequence variants in these genes' transcriptional regulatory elements that might alter gene expression. To do this, we are taking a comparative genomics approach to the identification of the proximal promoter and other potential transcriptional regulatory elements. For each gene of interest, we make a nucleotide multiple sequence alignment covering the region from ~10,000 bp upstream of exon 1 through exon 2. The alignment includes the human sequence and orthologous sequences from at least 3 different orders of mammals: rodents (mouse or rat, whichever genomic sequence is more complete across the region of interest), carnivores (dog), artiodactyls (cow) and marsupials (opossum). The release in January 2005 of apparently high-quality cow and opossum genome sequence assemblies has made our approach more robust than it might otherwise have been. In this approach, potential transcriptional regulatory elements are recognized as non-exon sequences that are conserved in at least 4 of the 5 sequences represented. With this approach, we generally see a proximal promoter that extends ~200 bp upstream (or sometimes downstream) of exon 1, and 3-4 other conserved sequence elements of 50-200 bp that merit resequencing.
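The conservation criterion above (non-exon sequence conserved in at least 4 of the 5 aligned sequences) can be sketched as a column-wise scan over a multiple alignment. This is an illustration under stated assumptions, not the project's pipeline: the alignment is taken as a dict of equal-length gapped strings keyed by hypothetical species names, exon columns are passed in to be excluded, and the 20-column window, 80% threshold and 50 bp minimum element length are hypothetical. Reported coordinates are alignment columns and would still need to be mapped back to human genomic positions.

```python
from typing import Dict, Iterable, List, Tuple


def conserved_columns(alignment: Dict[str, str], reference: str = "human",
                      min_agree: int = 4, exclude: Iterable[int] = ()) -> List[bool]:
    """Flag columns where >= min_agree sequences (reference included) carry the reference base.

    Gapped reference columns and excluded columns (e.g. exons) are never flagged.
    """
    ref = alignment[reference]
    skip = set(exclude)
    flags = []
    for i, base in enumerate(ref):
        if base == "-" or i in skip:
            flags.append(False)
            continue
        agree = sum(1 for seq in alignment.values() if seq[i] == base)
        flags.append(agree >= min_agree)
    return flags


def conserved_elements(flags: List[bool], window: int = 20, min_frac: float = 0.8,
                       min_len: int = 50) -> List[Tuple[int, int]]:
    """Merge windows with >= min_frac conserved columns into elements >= min_len long."""
    n = len(flags)
    hit = [False] * n
    for start in range(n - window + 1):
        if sum(flags[start:start + window]) >= min_frac * window:
            for j in range(start, start + window):
                hit[j] = True
    elements, i = [], 0
    while i < n:
        if not hit[i]:
            i += 1
            continue
        j = i
        while j < n and hit[j]:
            j += 1
        if j - i >= min_len:
            elements.append((i, j))  # half-open range of alignment columns
        i = j
    return elements
```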

At the rate of progress we have been making since the beginning of December 2004, we expect to finish the resequencing phase of this project on schedule, before the end of 2005.
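The schedule claim can be checked with simple arithmetic, treating each of the 110 exons of the six genes being resequenced as an equal unit of work, which is an assumption:

```python
TOTAL_EXONS = 110     # six resequenced genes in Table 1 (ACCalpha excluded)
EXONS_DONE = 29       # completed by mid-February 2005
MONTHS_ELAPSED = 2.5  # since the beginning of December 2004

rate = EXONS_DONE / MONTHS_ELAPSED                  # exons per month
months_remaining = (TOTAL_EXONS - EXONS_DONE) / rate
fraction_done = EXONS_DONE / TOTAL_EXONS

print(f"{fraction_done:.1%} done; about {months_remaining:.1f} more months")
```

At about 11.6 exons per month, the remaining 81 exons would take roughly 7 months from mid-February 2005, i.e. until about September 2005. This crude estimate ignores the promoter and conserved-element sequencing, which is not counted in the exon total.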
Table 1. Selected candidate genes

Gene name    Gene product                                             Chromosome band  Genomic size  Exons  SNP discovery
ACCalpha     Acetyl-CoA carboxylase alpha                             17q12            324,977 bp    60     done
FAS          Fatty acid synthase                                      17q25.3          19,893 bp     43     to be done
AMPKalpha1   AMP-activated protein kinase alpha 1 catalytic subunit   5p13.1           38,809 bp     11     done
AMPKalpha2   AMP-activated protein kinase alpha 2 catalytic subunit   1p32.2           63,103 bp     9      done
SREBP1       Sterol regulatory element binding protein type 1         17p11.2          24,941 bp     20     to be done
NFYA         CCAAT-binding factor/nuclear factor-Y                    6p21.1           26,027 bp     10     to be done
ChREBP       Carbohydrate response element binding protein            7q11.23          31,347 bp     17     done

Task 4: Determination of haplotypes and haplotype-tagging SNPs, using specialized software (Months 3-24).

Given an exhaustive list of polymorphisms in all coding and regulatory regions of the selected candidate genes, it should in principle be possible to identify disease-causing polymorphisms directly, e.g. by multivariate regression modeling. Certain polymorphisms, however, particularly in regulatory sequences, may be missed. A risk-associated haplotype may then indicate the presence of a causal, yet to be identified polymorphism that is in linkage disequilibrium with this haplotype.

The haplotypes of the AMPKalpha1, AMPKalpha2 and ChREBP genes, which have been almost fully sequenced in the 46 subjects, have been assessed. The software PHASE (ref 26) is used for the estimation of haplotypes, and the algorithm described by Stram et al. (ref 27) is used for the identification of haplotype-tagging SNPs.
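To illustrate what haplotype tagging does, the following sketch greedily adds SNPs until every common haplotype has a unique allele pattern. This is a plainly simpler substitute, not the algorithm of Stram et al., which chooses tags according to how well they predict haplotype counts (the R²h measure). The input format (haplotype strings with frequencies, as a phasing program such as PHASE might estimate) and the 5% frequency cutoff are assumptions.

```python
from typing import Dict, List


def n_patterns(haplotypes: List[str], snps: List[int]) -> int:
    """Number of distinct allele patterns produced by the given SNP subset."""
    return len({tuple(h[i] for i in snps) for h in haplotypes})


def greedy_tag_snps(hap_freqs: Dict[str, float], min_freq: float = 0.05) -> List[int]:
    """Greedily add SNPs until every common haplotype is uniquely identified.

    hap_freqs maps haplotype strings (one allele character per SNP position)
    to estimated frequencies; haplotypes below min_freq are ignored.
    """
    common = sorted(h for h, f in hap_freqs.items() if f >= min_freq)
    n_snps = len(common[0]) if common else 0
    chosen: List[int] = []
    # Distinct haplotypes are always separated by the full SNP set, so this terminates.
    while n_patterns(common, chosen) < len(common):
        candidates = [i for i in range(n_snps) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: n_patterns(common, chosen + [i])))
    return chosen
```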

In a preliminary study of the ACCalpha gene in 453 breast cancer cases and 469 control subjects from France, we found significant associations of breast cancer risk with four common ACCalpha haplotypes (ref 25). As part of the current project, we typed the same four haplotype-tagging SNPs as in our previous study for 1719 breast cancer cases and 2844 matched controls within the EPIC study. The results of this first analysis within EPIC did not show the associations observed in our previous study.
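Associations in a matched nested case-control design are usually estimated by conditional logistic regression; the report does not state the model used. Below is a minimal pure-Python sketch for a single covariate, assuming each matched set is given as (case value, list of control values), where the value might be a haplotype carrier indicator or an htSNP allele count. It fits the log odds ratio by Newton-Raphson on the conditional likelihood and returns it with its standard error; covariate adjustment and multiple haplotypes, which a real analysis would need, are omitted.

```python
import math
from typing import List, Sequence, Tuple

MatchedSet = Tuple[float, Sequence[float]]  # (case value, control values)


def conditional_logit_1d(matched_sets: List[MatchedSet], max_iter: int = 50,
                         tol: float = 1e-10) -> Tuple[float, float]:
    """Conditional logistic regression with one covariate and one case per set.

    Returns (beta, se): the log odds ratio per unit of the covariate and its
    standard error from the observed information.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case, *x_controls]
            shift = max(beta * x for x in xs)  # avoids overflow in exp
            w = [math.exp(beta * x - shift) for x in xs]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            var = sum(wi * x * x for wi, x in zip(w, xs)) / total - mean ** 2
            score += x_case - mean  # first derivative of the log conditional likelihood
            info += var             # minus its second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)
```

For 1:1 pairs with a binary covariate, the estimate reduces to the ratio of discordant pairs: 30 pairs in which only the case is exposed and 10 in which only the control is exposed give an odds ratio of 3. Sets in which all members share the same value carry no information, and the function fails with a division by zero if every set is concordant.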

We hypothesized that our initial positive results could have been due to linkage disequilibrium between the assessed ACCalpha risk haplotypes and some as yet undetected causal noncoding variants. Using the program Haploview (ref 28), we have therefore started to examine in greater depth the haplotypes composed of the SNPs spanning the ACCalpha gene region, as documented in the HapMap project (http://www.hapmap.org). The algorithm of Gabriel et al. (ref 29) was used to define linkage disequilibrium blocks. Haplotypes were estimated using an accelerated EM algorithm similar to the partition/ligation method described by Qin et al. (ref 30). This analysis led to the identification of 7 additional htSNPs, which are currently (February-March 2005) also being typed for the 1719 breast cancer cases and 2844 control subjects from EPIC. We plan to extend this analysis to a total of about 3000 breast cancer cases and over 3000 matched controls within the EPIC cohorts.
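The EM idea behind haplotype estimation is easiest to see for two biallelic SNPs, where only double heterozygotes have ambiguous phase. The sketch below is an illustration under stated assumptions, not Haploview's code or the partition/ligation algorithm of Qin et al., which extends EM to many SNPs: genotypes are coded as counts (0, 1, 2) of one allele, and Lewontin's D' is computed from the point estimates. Gabriel et al. define blocks from confidence bounds on D', which this sketch does not compute.

```python
from collections import Counter
from typing import Dict, List, Tuple

Hap = Tuple[int, int]                         # (allele at SNP A, allele at SNP B), 0/1
HAPS: List[Hap] = [(0, 0), (0, 1), (1, 0), (1, 1)]
_ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype = count of allele 1


def em_two_snps(genotypes: List[Tuple[int, int]], max_iter: int = 200,
                tol: float = 1e-12) -> Dict[Hap, float]:
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM."""
    known: Counter = Counter()
    n_double_het = 0
    for ga, gb in genotypes:
        if ga == 1 and gb == 1:
            n_double_het += 1  # phase ambiguous: 00/11 or 01/10
        else:
            # at most one locus is heterozygous, so pairing alleles is unambiguous
            known.update(zip(_ALLELES[ga], _ALLELES[gb]))
    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPS}
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes phased as 00/11
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        r = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = {h: float(known[h]) for h in HAPS}
        for h in ((0, 0), (1, 1)):
            counts[h] += n_double_het * r
        for h in ((0, 1), (1, 0)):
            counts[h] += n_double_het * (1 - r)
        # M-step: re-estimate frequencies from the expected counts
        new = {h: counts[h] / total for h in HAPS}
        delta = max(abs(new[h] - freqs[h]) for h in HAPS)
        freqs = new
        if delta < tol:
            break
    return freqs


def d_prime(freqs: Dict[Hap, float]) -> float:
    """Lewontin's D' from two-SNP haplotype frequencies."""
    p_a = freqs[(1, 0)] + freqs[(1, 1)]
    p_b = freqs[(0, 1)] + freqs[(1, 1)]
    d = freqs[(1, 1)] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0
```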
KEY RESEARCH ACCOMPLISHMENTS

The key research accomplishments emanating from the research performed during the first year of the award are:

1. Discovery of sequence variations through the exhaustive resequencing of the coding and regulatory regions of the AMPKalpha1, AMPKalpha2 and ChREBP genes, which code for principal regulatory factors of key lipogenic enzymes (Appendix 1, Appendices 2.1-2.3);

2. Assessment of haplotypes and selection of haplotype-tagging SNPs in the ACCalpha, AMPKalpha1, AMPKalpha2 and ChREBP genes, to be examined for association with breast cancer risk using a case-control study nested within EPIC;

3. Selection of a series of over 2500 breast cancer cases and over 3500 control subjects within the EPIC cohorts, for genotyping of haplotype-tagging SNPs in the selected candidate genes and analysis of gene-breast cancer associations.


REPORTABLE OUTCOMES

None for year 1.


CONCLUSIONS 

Our project is on schedule for the identification of SNPs in our candidate genes (systematic resequencing), the selection of haplotype-tagging SNPs, the selection of case and control subjects for nested case-control analysis within the EPIC cohorts, and the extraction of DNA from buffy coat samples of cases and controls.
APPENDICES


Appendix 1: Genomic structure of the candidate genes examined
Appendix 2.1

AMPKalpha1/PRKM1: Summary of resequencing results.

PRKAA1 is a gene with 9 commonly used exons and 2 alternative exons. To date, we have resequenced 8 of these. Resequencing of exons 1, 7 and 8 and the proximal promoter is in progress.
Appendix 2.2

AMPKalpha2/PRKM2: Summary of resequencing results.

PRKAA2 is a gene of 9 exons. Resequencing of 8 of its exons (exons 2-9) is complete. Resequencing of exon 1 and the proximal promoter is in progress.
Appendix 2.3

CHREBP/WBSCR14: Summary of resequencing results.

WBSCR14 is a gene of 15 exons. Resequencing of 11 of its exons (exons 2-8 and 12-15) is complete. Resequencing of exon 1 and the proximal promoter, as well as of exons 9-11, is in progress.